Regulatory interactions between genes show a large amount of cross-species variability, even when 
the underlying functions are conserved: There are many ways to achieve the same function. Here we 
investigate the ability of regulatory networks to reproduce given expression levels within a simple 
model of gene regulation. We find an exponentially large space of regulatory networks compatible 
with a given set of expression levels, giving rise to an extensive entropy of networks. Typical 
realisations of regulatory networks are found to share a bias towards symmetric interactions, in line 
with empirical evidence. 
The expression of genes is regulated such that the right combinations of gene products are generated at the right time and place of an organism. Key regulators of gene expression are transcription factors, proteins which bind to specific sites on DNA and influence the expression of nearby genes. Typically, the expression of a gene is affected by a combination of several transcription factors, and conversely, a transcription factor regulates several genes. Expression levels can thus depend on the entire set of regulatory interactions between transcription factors and their target genes, referred to as a regulatory network. These intracellular reaction networks process extracellular information to induce specific gene expression patterns, allowing, for instance, the development of a complex body plan or responses to external conditions.

Even though regulatory networks are tuned carefully to produce specific expression patterns, there are in general many networks fulfilling a regulatory task. An example is the control of mating type in different yeast species: the same set of genes that is controlled in S. cerevisiae by an activator which is upregulated in a certain state is controlled in C. albicans by a repressor which is downregulated in that state [1]. A second prominent example is the development of the anterior patterning in insect embryos, leading to the formation of the insect's head. The gene crucial to this process in the fruit fly Drosophila, called bicoid, is absent in many other insects, where a combination of different genes takes on the same task [2]. Even whole sets of genes which are co-expressed across the entire yeast family can have different regulatory interactions in different species [3]. The source of these changing interactions is a rapid evolutionary turnover of transcription factor binding sites at the level of DNA sequences [4, 5], which can generate new regulatory interactions. A recent essay on the degeneracy of regulatory networks can be found in [6].

The large number of regulatory networks with a given function (viable networks) is particularly relevant from an evolutionary perspective, as neutral evolution gradually explores different viable networks. Viable networks can form a set with intricate geometric features in the space of regulatory networks. An analogy is the set of all RNA sequences which fold into a given secondary structure, a set which stretches across the entire sequence space [7]. Numerical studies based on simple models of gene regulation [8] found that the space of viable networks can be crossed in small steps, and that a wide range of new expression patterns can be generated by small changes to different viable networks [9, 10].

Figure 1: Expression levels and regulatory networks.

a) We list genes along the x-axis, and states of the organism along the y-axis. Following established convention in expression analysis, expression levels are colour-coded, with high expression levels shown in red (dark) and low levels in green (light).

b) Regulatory interactions must be compatible with gene expression levels in all states of the organism. The schematic example shows interactions between transcription factors and a single target gene: two enhancing interactions (→) with upregulated transcription factors and a repressive interaction (⊣) with a downregulated transcription factor lead to the activation of the target gene.

These observations call for a statistical approach based on the ensemble of all viable networks, which is the topic of this paper. We consider a model with two classes of genes: structural genes (coding, e.g., for enzymes or cellular components) and transcription factors. The expression levels of structural genes are prescribed for different states of the organism and are fixed for a given state. For instance, when nutrients are available, specific enzymes have to be produced to digest these nutrients. On the other hand, the expression levels of transcription factors and the regulatory interactions between genes can be adapted to meet the expression levels of structural genes. The freedom to alter expression levels of transcription factors turns out to be crucial.

In the following, we develop a simple model based on quenched random expression levels of structural genes, and adaptive regulatory interactions and expression levels of transcription factors. The ensemble of viable networks is characterized by the microcanonical partition function

$$Z = \mathrm{Tr}_{\mathbf{J},\boldsymbol{\xi}^t}\, I(\mathbf{J},\boldsymbol{\xi}), \qquad (1)$$

giving the fraction of viable networks in terms of the trace $\mathrm{Tr}_{\mathbf{J},\boldsymbol{\xi}^t}$ over the phase space (regulatory interactions $\mathbf{J}$ and the expression levels of transcription factors $\boldsymbol{\xi}^t$) and an indicator function $I(\mathbf{J},\boldsymbol{\xi})$ of couplings and all expression levels. The indicator function is defined to equal one for a viable network and zero otherwise.
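To make the definition of $Z$ concrete, here is a minimal Python sketch that estimates it by brute force for a tiny system: it samples couplings and TF expression levels uniformly on hyperspheres and counts the fraction of samples that are viable. The explicit threshold form of the viability condition (threshold κ, normalization by the square root of the number of TFs), the exclusion of TF self-regulation, and all function names are assumptions made for illustration only.

```python
import numpy as np


def sphere_rows(rng, n_rows, n_cols, mask=None):
    """Rows drawn uniformly from the sphere of radius sqrt(n_cols); masked entries are zero."""
    x = rng.standard_normal((n_rows, n_cols))
    if mask is not None:
        x = x * mask
    return x * np.sqrt(n_cols) / np.linalg.norm(x, axis=1, keepdims=True)


def is_viable(J, xi, N, kappa=0.0):
    """Assumed condition: xi_i^mu * sum_{j>N} J_ij xi_j^mu / sqrt(n_tf) > kappa for all i, mu."""
    field = J @ xi[N:] / np.sqrt(J.shape[1])   # regulatory input, shape (genes, states)
    return bool(np.all(xi * field > kappa))


def viable_fraction(N=2, n_tf=2, P=2, kappa=0.0, samples=20000, seed=0):
    """Naive Monte Carlo estimate of Z: fraction of random (J, xi_t) that are viable."""
    rng = np.random.default_rng(seed)
    M = N + n_tf                                       # total number of genes
    xi_s = sphere_rows(rng, N, P)                      # quenched structural expression levels
    mask = np.ones((M, n_tf))
    mask[N + np.arange(n_tf), np.arange(n_tf)] = 0.0   # no TF self-regulation (assumption)
    hits = 0
    for _ in range(samples):
        J = sphere_rows(rng, M, n_tf, mask)            # couplings from TFs to all genes
        xi_t = sphere_rows(rng, n_tf, P)               # adaptive TF expression levels
        hits += is_viable(J, np.vstack([xi_s, xi_t]), N, kappa)
    return hits / samples


if __name__ == "__main__":
    Z = viable_fraction()
    print(Z, np.log(Z))   # fraction of viable networks and its logarithm
```

Naive sampling of this kind becomes exponentially inefficient as the system grows, since the viable fraction itself shrinks exponentially; this is why analytical (replica) methods or annealing are needed beyond toy sizes.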

Specifically, genes are labelled $i = 1, \dots, N$ for structural genes and $i = N+1, \dots, \beta N$ for transcription factors. The regulatory network is encoded in a matrix of regulatory interactions $J_{ij}$: a positive $J_{ij}$ indicates that gene $j > N$ produces a transcription factor which enhances the expression of gene $i \neq j$, and a negative $J_{ij}$ indicates that it represses gene $i$. Different external and internal states of the organism are labelled by $\mu = 1, \dots, \alpha N$.

The quantity $\xi_i^\mu$ denotes the (log-)expression level of gene $i$ in state $\mu$; it is positive for high concentrations of the gene product and negative for low concentrations. Assuming transcription factors act independently on their target genes, the condition for a viable network is modelled as a threshold condition (2) on the total regulatory input $\sum_{j>N} J_{ij}\xi_j^\mu$ that gene $i$ receives in state $\mu$.
Threshold condition (2) has been used extensively to model neural [11] and gene regulatory networks [9, 10, 12, 13]. The indicator function for a viable network in the partition function (1) can be written as a product of Heaviside step functions $\Theta(x)$, one for each gene $i$ and state $\mu$, see (3).
We constrain the vectors of regulatory interactions $\mathbf{J}_i$ and of expression levels to lie on hyperspheres. This defines the trace in (1) as an integral over the spherical measure $d\mu(\mathbf{J}_i) = \prod_{j>N} dJ_{ij}\,\delta\big((\beta-1)N - \sum_{j>N} J_{ij}^2\big)$, and analogously for the transcription factor expression levels. The quenched average of (1) over the expression levels of structural genes, $\langle\langle \ln Z \rangle\rangle = \int \prod_{i \le N} d\mu(\boldsymbol{\xi}_i)\, \ln Z$, is performed using the replica trick.
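For concreteness, a threshold condition of the kind described, written in the standard perceptron-type form, could read as follows. The threshold κ (κ = 0 in the figures), the normalization by the hypersphere radius $\sqrt{(\beta-1)N}$ and the exclusion $j \neq i$ are assumptions here rather than statements from the text; the last line is the standard replica identity behind the replica trick.

```latex
% Assumed explicit form of the threshold condition (2)
\xi_i^\mu \, \frac{1}{\sqrt{(\beta-1)N}} \sum_{j>N,\, j\neq i} J_{ij}\, \xi_j^\mu > \kappa
\qquad \text{for all } i = 1,\dots,\beta N,\; \mu = 1,\dots,\alpha N

% Corresponding indicator function (3)
I(\mathbf{J},\boldsymbol{\xi}) = \prod_{i=1}^{\beta N} \prod_{\mu=1}^{\alpha N}
  \Theta\!\Big( \frac{\xi_i^\mu}{\sqrt{(\beta-1)N}} \sum_{j>N,\, j\neq i} J_{ij}\, \xi_j^\mu - \kappa \Big)

% Replica identity used for the quenched average
\langle\langle \ln Z \rangle\rangle = \lim_{n \to 0} \frac{\langle\langle Z^n \rangle\rangle - 1}{n}
```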





Figure 2: Averaging over expression levels. a) The diagrams corresponding to the first three terms in (5) are shown along with their combinatorial factors. Nodes represent variables $\xi_i$, solid lines indicate the corresponding matrix entries $G_{ij}$, and dashed lines are contractions $i = j$. b) Plotting the logarithm of (5) against $\gamma$ shows the contribution of different diagrams. The first diagram gives a linear term (thin solid line); the series up to second and third order is shown by the dashed and dash-dotted curves, respectively. These are valid approximations only up to some finite value of $\gamma$. The thick solid line gives the full series to infinite order and is compared with a numerical computation of (5), where $G_{ij}$ was taken to be a random matrix of size $N = 50$ with i.i.d. normally distributed elements.



Transcription factors play a special role: their expression levels provide the regulatory input for every gene in the regulatory network. This produces an effective coupling between the regulatory interactions of different genes. One consequence emerges already at the level of the average of (3) over the expression levels $\xi$. As an illustration, we consider a toy problem where the average of $\exp\{-i\gamma/(2\sqrt{N}) \sum_{ij} G_{ij}\xi_i\xi_j\}$ is computed over a distribution of independent normally distributed variables $\xi_i$. $G_{ij}$ is a symmetric matrix with uncorrelated random entries. The average can be expanded as a power series (5) in $\gamma$, with the shorthand $z_2 = N^{-2}\sum_{ij} G_{ij}G_{ji}$ (and analogously $z_k$ for higher powers) denoting the normalized matrix trace.

The successive terms in the power series (5) can be represented diagrammatically; Fig. 2a) shows the first three diagrams. Fig. 2b) shows how the different powers in (5) contribute to the average and how, for finite values of $\gamma$, the series has to be taken to infinite order, giving $w(z) = -\tfrac{1}{2}\,\mathrm{Tr}\,\log\big(1 + i\gamma G/\sqrt{N}\big)$. This is in contrast to the standard situation in fully-connected disordered models, where in the thermodynamic limit the series in $\gamma$ terminates after the first term.
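The toy average can be checked numerically. The Python sketch below assumes the exponent $-i\gamma/(2\sqrt{N})\sum_{ij} G_{ij}\xi_i\xi_j$ with a symmetric Gaussian $G$, so that the exact Gaussian integral equals $-\tfrac{1}{2}\sum_a \log(1 + i\gamma\lambda_a)$ over the eigenvalues $\lambda_a$ of $G/\sqrt{N}$. It compares this closed form with the power series in $\gamma$ truncated at a given order and with a direct Monte Carlo average; the normalization and all function names are illustrative assumptions.

```python
import numpy as np


def symmetric_gaussian(N, rng):
    """Symmetric N x N matrix with i.i.d. standard normal off-diagonal entries."""
    X = rng.standard_normal((N, N))
    return (X + X.T) / np.sqrt(2.0)


def log_average_exact(G, gamma):
    """log < exp(-i gamma/(2 sqrt N) xi^T G xi) > for xi_i ~ N(0, 1) i.i.d.,
    via the Gaussian integral: -1/2 sum_a log(1 + i gamma lambda_a)."""
    N = G.shape[0]
    lam = np.linalg.eigvalsh(G / np.sqrt(N))        # real spectrum of the symmetric matrix
    return -0.5 * np.sum(np.log(1.0 + 1j * gamma * lam))


def log_average_series(G, gamma, order):
    """Same quantity from the power series in gamma, truncated after `order` terms."""
    N = G.shape[0]
    A = G / np.sqrt(N)
    Ak = np.eye(N)
    total = 0.0j
    for k in range(1, order + 1):
        Ak = Ak @ A                                  # A^k, whose trace gives the k-th term
        total += (-1) ** (k + 1) / k * (1j * gamma) ** k * np.trace(Ak)
    return -0.5 * total


def log_average_mc(G, gamma, samples, rng):
    """Direct Monte Carlo estimate of the same log-average."""
    N = G.shape[0]
    xi = rng.standard_normal((samples, N))
    quad = np.einsum("si,ij,sj->s", xi, G, xi)       # xi^T G xi for every sample
    return np.log(np.mean(np.exp(-0.5j * gamma * quad / np.sqrt(N))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = symmetric_gaussian(50, rng)
    for order in (1, 2, 3):
        print(order, log_average_series(G, 0.3, order))
    print("exact", log_average_exact(G, 0.3))
```

For $\gamma$ larger than $1/\max_a|\lambda_a|$ (roughly 0.5 for a Wigner-type $G$) the truncated series diverges as more terms are added, whereas the closed form stays finite, which mirrors the behaviour shown in Fig. 2b).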

Applying the approach (5) to the full model (1)-(3), the entropy of viable networks $\langle\langle S \rangle\rangle = N s$ can be computed in the thermodynamic limit $N \to \infty$ by standard methods. Within a replica-symmetric ansatz we obtain an explicit saddle-point expression (6) for $s$.

The entropy $s$ of viable networks is determined by the extremum of (6) over the saddle point parameters $F$, $x_1$, $\hat{x}_1$, $x_2$, $\hat{x}_2$, and $h$. The saddle point parameter $h$ has an intuitive interpretation in terms of the symmetry of regulatory interactions and will be discussed below. $H(x)$ denotes the cumulative Gaussian measure.
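In Gardner-type replica calculations $H(x)$ usually denotes the Gaussian tail integral $H(x) = \int_x^\infty Dz$ with $Dz = dz\, e^{-z^2/2}/\sqrt{2\pi}$. Assuming that convention here, it can be evaluated stably through the complementary error function, as in this small Python sketch with a quadrature cross-check (function names are illustrative).

```python
import math


def H(x: float) -> float:
    """Gaussian tail measure H(x) = int_x^inf exp(-z^2/2)/sqrt(2 pi) dz (assumed convention)."""
    # erfc avoids the cancellation in 1 - Phi(x) for large positive x
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def H_quadrature(x: float, upper: float = 12.0, n: int = 200_000) -> float:
    """Midpoint-rule evaluation of the same integral, truncated at `upper`."""
    dz = (upper - x) / n
    total = 0.0
    for k in range(n):
        z = x + (k + 0.5) * dz
        total += math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    print(H(1.0), H_quadrature(1.0))
```

With this convention $H(-x) = 1 - H(x)$; if the opposite (lower-tail) convention is intended, one simply replaces $H(x)$ by $H(-x)$.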

The entropy $s$ of viable networks decreases with increasing number of patterns $P = \alpha N$; see Fig. 3. This is to be expected, as each set of expression patterns induces a new set of constraints on the network. However, the entropy remains finite even as the number of patterns becomes large with $\alpha \to \infty$: there (typically) always exists a viable network, and there is no transition to a phase where solutions of (2) no longer exist. Such phase transitions are well known in neural networks and many combinatorial problems [11]. In contrast, the ability of regulatory networks to store a large number of expression patterns of structural genes stems directly from the freedom to choose the expression levels of transcription factors: transcription factor expression levels adapt in such a way that regulatory interactions compatible with the expression levels of all genes can be found.
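The adaptation of TF expression levels can be explored numerically in the spirit of the simulated annealing mentioned in the caption of Fig. 3. The Python sketch below is a hypothetical minimal version, not the authors' code: the energy is the number of violated threshold constraints (in an assumed form with threshold κ), each move perturbs one coupling row or one TF expression profile and projects it back onto its hypersphere, and the geometric cooling schedule and all parameter values are arbitrary choices.

```python
import numpy as np


def normalize_rows(x, mask=None):
    """Project each row onto the sphere of radius sqrt(#columns), zeroing masked entries."""
    if mask is not None:
        x = x * mask
    return x * np.sqrt(x.shape[1]) / np.linalg.norm(x, axis=1, keepdims=True)


def violations(J, xi, N, kappa=0.0):
    """Number of (gene i, state mu) pairs violating the assumed condition
    xi_i^mu * sum_{j>N} J_ij xi_j^mu / sqrt(n_tf) > kappa."""
    field = J @ xi[N:] / np.sqrt(J.shape[1])
    return int(np.sum(xi * field <= kappa))


def anneal(N=6, beta=2, alpha=1.0, kappa=0.0, sweeps=400,
           t_start=2.0, t_end=0.01, step=0.5, seed=1):
    rng = np.random.default_rng(seed)
    n_tf, M, P = (beta - 1) * N, beta * N, int(round(alpha * N))
    mask = np.ones((M, n_tf))
    mask[N + np.arange(n_tf), np.arange(n_tf)] = 0.0    # TFs do not regulate themselves
    xi = normalize_rows(rng.standard_normal((M, P)))     # rows i < N (structural) stay fixed
    J = normalize_rows(rng.standard_normal((M, n_tf)), mask)
    E = E_start = best = violations(J, xi, N, kappa)
    for temp in np.geomspace(t_start, t_end, sweeps):
        for _ in range(M + n_tf):                         # one sweep: M coupling rows + n_tf TF rows
            r = int(rng.integers(M + n_tf))
            J_new, xi_new = J, xi
            if r < M:                                     # move: perturb the coupling vector of gene r
                J_new = J.copy()
                J_new[r] = normalize_rows(J[r:r + 1] + step * rng.standard_normal((1, n_tf)),
                                          mask[r:r + 1])[0]
            else:                                         # move: perturb one TF expression profile
                row = N + (r - M)
                xi_new = xi.copy()
                xi_new[row] = normalize_rows(xi[row:row + 1] + step * rng.standard_normal((1, P)))[0]
            E_new = violations(J_new, xi_new, N, kappa)
            # Metropolis rule on the number of violated constraints
            if E_new <= E or rng.random() < np.exp(-(E_new - E) / temp):
                J, xi, E = J_new, xi_new, E_new
                best = min(best, E)
        if E == 0:
            break
    return J, xi, E, E_start, best


if __name__ == "__main__":
    J, xi, E, E_start, best = anneal()
    print(E_start, E, best)
```

Restricting the moves to `r < M` would freeze the TF expression levels, turning each gene's condition into an ordinary perceptron problem with fixed inputs, which is the situation the text contrasts with.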

The saddle point parameter $h \propto \sum_{i \neq j} J_{ij}J_{ji}$ is the symmetry parameter of the resulting regulatory network. A positive value of $h$ indicates that if gene $i$ regulates gene $j$ and gene $j$ also regulates $i$, the signs of these interactions are correlated, with like signs occurring more frequently than opposite signs. The origin of this symmetry lies in condition (2) for a viable network, where a positive value of $\xi_i^\mu \xi_j^\mu$ for some $i$, $j$, $\mu$ favours positive values of both $J_{ij}$ and $J_{ji}$, and analogously for negative values. Thus the symmetry parameter $h$ increases with the number of expression patterns; Fig. 3 shows the analytical result for $h$ along with the outcome of numerical simulations.
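The text fixes $h$ only up to normalization. A natural but hypothetical choice divides $\sum_{i \neq j} J_{ij}J_{ji}$ by $\sum_{i \neq j} J_{ij}^2$, so that $h$ becomes the correlation coefficient of reciprocal couplings and lies between $-1$ (antisymmetric) and $+1$ (symmetric). A Python sketch of this measure:

```python
import numpy as np


def symmetry_parameter(J_tf):
    """Hypothetical normalized symmetry measure of a square TF-to-TF coupling block:
    h = sum_{i != j} J_ij J_ji / sum_{i != j} J_ij^2, which lies in [-1, 1]."""
    off = ~np.eye(J_tf.shape[0], dtype=bool)   # ignore self-couplings
    num = np.sum((J_tf * J_tf.T)[off])         # products of reciprocal couplings
    den = np.sum((J_tf ** 2)[off])
    return float(num / den)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 100))
    print(symmetry_parameter(A))          # close to 0 for independent couplings
    print(symmetry_parameter(A + A.T))    # 1 for a symmetric matrix
```

By the Cauchy-Schwarz inequality the numerator never exceeds the denominator in magnitude, which guarantees the range $[-1, 1]$. The function can be applied to the square block of couplings among transcription factors of any sampled network.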

This statistical bias towards symmetric interactions is compatible with empirical data on regulatory networks. A literature search for well-documented cases of mutually interacting genes with known interaction sign finds 9 mutually interacting gene pairs with like interaction signs [14], compared to only 3 pairs with opposite signs [15]. Nontrivial statistics of reciprocal interactions have also been found in neural and metabolic networks [16], where, however, the signs of the interactions are generally unknown.
Figure 3: Entropy and symmetry of viable networks.

With increasing number of patterns $P = \alpha N$, the space of viable networks shrinks, and the networks become increasingly symmetric; see text. Here the entropy $s$ per structural gene (solid line) and the symmetry parameter $h$ (dashed line) are plotted against $\alpha$ for $\beta = 2$, $\kappa = 0$. The symbols stem from numerical simulations with $N = 80$, averaged over 20 realizations of the quenched disorder (mean and standard error). The numerics are based on simulated annealing under Monte Carlo dynamics of the regulatory interactions $J_{ij}$ and the expression levels of transcription factors $\xi_{i>N}$.

Figure 4: Response to changing expression levels.

The overlaps of perturbed and unperturbed systems (see text) are plotted against the perturbation strength $\eta$: structural gene expression level overlaps (red dashed line) tend to zero with increasing $\eta$ by construction, whereas TF expression level overlaps (red solid line) quickly reach a plateau. The same holds for regulatory interactions to TFs (black dotted line) and interactions to structural genes (black dash-dotted line). The plateau value decreases with the fraction $1 - 1/\beta$ of TFs in the genome. The data stem from Monte Carlo simulations with $N = 40$, $\alpha = 1$, $\beta = 2$, and $\kappa = 0$, averaged over 20 samples.

Over long evolutionary timescales, the required expression levels of structural genes can change. In the case of enzymes, for instance, changing nutrient availability or changing metabolic rates alter the required expression levels. Such changes of the expression levels of structural genes induce adaptive changes both of the regulatory network and of the expression levels of transcription factors. To investigate the adaptation to changing expression levels of structural genes, we systematically perturb the expression levels of structural genes of a viable network, in general rendering it unviable at first. (Expression levels $\xi_i^\mu$ with $i \le N$ are perturbed by adding i.i.d. Gaussian random variables with mean zero and standard deviation $\eta$ and then normalizing their variances to one again.) Subsequently, regulatory interactions and transcription factor expression levels are adapted until the viability condition (2) is satisfied again. The overlap $q_s^\xi = \frac{1}{\alpha N^2}\sum_{i \le N,\mu} \xi_i^\mu \xi_i'^\mu$ of the structural gene expression levels of the unperturbed (unprimed) and the perturbed (primed) system quantifies the strength of the perturbation; the analogously defined overlaps $q_t^\xi$ of TF expression levels and $q_s^J$, $q_t^J$ of regulatory interactions to structural genes and to TFs quantify the response of the system to this perturbation. Figure 4 shows the overlaps as a function of perturbation strength. One finds that already small perturbations with $\eta \ll 1$ result in a drop of the overlaps to a plateau value. Larger perturbations, and even the limit $\eta \to \infty$, induce only a slow decay of $q_t^\xi$, $q_s^J$ and $q_t^J$ from their plateau values. Accordingly, close to any viable network for one set of expression levels of structural genes, there exists a viable network for any other, even unrelated, set of expression levels. This effect allows fast adaptation to changes in the required expression levels.
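The perturbation protocol and the overlap of structural gene expression levels can be sketched in Python as follows. Normalizing each gene's profile separately to unit variance is an assumption (the text only states that variances are normalized to one again), and the function names are illustrative. With this choice the structural-gene overlap approaches $1/\sqrt{1+\eta^2}$ for large systems, which tends to zero as $\eta$ grows, consistent with Fig. 4.

```python
import numpy as np


def unit_variance_rows(x):
    """Rescale each gene's expression profile to unit mean square (assumed normalization)."""
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True))


def perturb(xi_s, eta, rng):
    """Add i.i.d. Gaussian noise with standard deviation eta, then renormalize each row."""
    return unit_variance_rows(xi_s + eta * rng.standard_normal(xi_s.shape))


def overlap(a, b):
    """q = (1/(N P)) sum_{i, mu} a_i^mu b_i^mu; with P = alpha N this is the 1/(alpha N^2) prefactor."""
    return float(np.mean(a * b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xi = unit_variance_rows(rng.standard_normal((200, 200)))
    for eta in (0.1, 1.0, 10.0):
        print(eta, overlap(xi, perturb(xi, eta, rng)), 1 / np.sqrt(1 + eta ** 2))
```

The same `overlap` function applies unchanged to TF expression levels or to coupling matrices before and after re-adaptation.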



Another consequence of the observed drop of the TF expression level overlap to a plateau is that expression levels of TFs change more than those of structural genes for small perturbations. For large perturbations, the expression levels of TFs change less than those of structural genes. This effect may explain an apparent contradiction in the cross-species comparison of experimentally measured expression levels: a comparison of humans with other primates shows large changes of TF expression levels compared to structural genes [17], whereas different Drosophila species show only small changes of TF expression levels compared to structural genes [18].

In summary, we have investigated the degeneracy of regulatory networks within a simple model of genetic regulation. Whereas the connection between annealed TF expression levels and the large space of viable networks is likely to persist also in more complex models, the geometry of this space may well change. In particular, models taking into account physical interactions between transcription factors to implement logical functions [13] lead to p-spin interactions $J_{ijk\dots}$ and may result in a disconnected solution space and combinatorial complexity.

Many thanks to A. Altland and M. Cosentino Lagomarsino. Funding is acknowledged under grant BE 2478/2-1 and SFB 680.